Abstract



A machine translation system is said to be complete if all expressions that are correct 
according to the source-language grammar can be translated into the target language. 
This paper addresses the completeness issue for compositional machine translation in 
general, and for compositional machine translation of context-free grammars in par- 
ticular. Conditions that guarantee translation completeness of context-free grammars 
are presented. 



1 Introduction



Systems for translation of controlled language¹ require the source text to be expressed
within severe syntactical and lexical limits. One of the objectives of such systems is that 
an author who fully conforms to the imposed restrictions is rewarded with a reliable and 
fully automatic translation of his text into one or more target languages. Therefore a proof 
of their completeness is of great importance. A machine translation system is said to be 
complete if all expressions that are correct according to the source-language grammar can 
be translated into the target language. 

The starting point of this research has been the compositional approach to machine
translation developed in the Rosetta project [Rosetta 1994]. An important difference is that
Rosetta made use of a rather complex grammar formalism, M-grammars, for which
completeness could not be proven, whereas the current research focuses on the provability of
completeness for relatively simple grammar formalisms, which may be more appropriate
for machine translation of controlled languages.



*The research presented here is part of my PhD-project on the completeness of compositional machine 
translation. In this PhD-project I address the completeness issue for several grammar formalisms, describ- 
ing and comparing them in terms of an abstract, algebraic formulation of compositional grammars and 
compositional translation. This paper is restricted mainly to the context-free grammar formalism.



1 For more information on controlled language, see http://wwwots.let.ruu.nl/Controlled-languages/

First, sections 2 and 3 describe our view and definitions of compositional grammar and
compositional machine translation, respectively. Section 4 presents the theme of this
paper, viz. completeness of compositional machine translation. Subsequently, section 5 works
out completeness conditions for compositional grammars based on context-free grammars.
These conditions are rather restrictive and may therefore find application primarily in
areas such as controlled languages. One of the objectives of ongoing research is to relax
the conditions. Section 6 concludes the paper and discusses ongoing and future research.



2 Compositional Grammars 

This section defines compositional grammars (subsection 2.1), and the auxiliary notions
syntactic derivation tree (subsection 2.2) and semantic derivation tree (subsection 2.3).



2.1 A Definition of Compositional Grammars 

Compositional machine translation assumes that the source language (SL) and the target
language (TL) are defined by means of compositional grammars, i.e. grammars that obey
the well-known compositionality principle (cf. [Partee et al. 1993, Janssen 1986, Gamut 1991,
p.315ff]). Abstracting away from the details of any specific syntactic formalism, we
define a compositional grammar G as consisting of (i) a syntactic component, (ii) a semantic
component, and (iii) an interpretation from the syntactic component to the semantic
component (cf. Montague's Universal Grammar, [Thomason 1974]). Roughly, the syntactic
component consists of a set of basic expressions (words), each having a syntactic category,
and a set of syntactic rules that build larger expressions from basic expressions. Likewise,
the semantic component consists of a set of basic meanings, each having a semantic
category, and a set of semantic rules that build larger meanings from basic meanings. The
interpretation associates with every basic expression a set of basic meanings, and with
every syntactic rule a set of semantic rules.

There now follows a more detailed description of these components, which the eager reader 
may wish to skip on a first pass. 



• The syntactic component specifies a finite set of basic expressions BE, a finite set of
syntactic rules SynR, a finite set of syntactic categories SynCats, and a syntactic
type-assignment function SynType(·). Basic expressions are, roughly, the smallest meaningful
units in a language (more or less the stems of content words). Syntactic rules are operations
that recursively build derived expressions from basic expressions. Syntactic categories
describe the syntactic properties of expressions. Basic expressions b all have a syntactic
category SynCat(b); syntactic rules restrict their arguments in their categories, and specify
the category of the derived expression they yield. The syntactic type-assignment function
associates every syntactic rule R with a 2-tuple SynType(R) consisting of a so-called
argument list SynAL(R) of the categories of its arguments and its resultant category.
The arity arity(R) of a syntactic rule is the number of categories in the rule's argument
list. We require that all syntactic and semantic rules are total: They must be applicable
for any combination of arguments that matches their argument lists. Note that this is not
a real restriction of expressiveness: Any partial function can be made into a total function
by an appropriate tuning of the set of categories.
• The semantic component has the same structure as the syntactic component: It specifies
a finite set of basic meanings BM, a finite set of semantic rules SemR, a finite set of
semantic categories SemCats, and a semantic type-assignment function SemType(·). Basic
meanings are expressions of the semantic domain of some logical language. Semantic rules
are operations in the logical language that recursively build derived meanings from basic
meanings. For the purpose of compositional translation the choice of this logical language
is not very important. However, the semantic rules must be total. Semantic categories
describe the semantic properties of semantic expressions. Basic meanings m all have a
semantic category SemCat(m); semantic rules restrict their arguments in their semantic
categories, and specify the category of the derived meaning they yield. The semantic
type-assignment function associates every semantic rule M with a 2-tuple SemType(M)
consisting of a so-called argument list SemAL(M) of the categories of its arguments and
its resultant category. The arity arity(M) of a semantic rule is the number of categories
in the rule's argument list.

• The interpretation, denoted $\llbracket \cdot \rrbracket$, associates every basic expression with a set of basic
meanings, and every syntactic rule with a set of semantic rules. The arities of associated 
syntactic and semantic rules must match. Note that our approach differs here from Mon- 
tague grammar, in which a basic expression (syntactic rule) is associated with exactly one 
basic meaning (semantic rule). 
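
The three components described above can be pictured as a small data structure. The following is only a sketch under stated assumptions: the class names `Rule` and `Grammar`, the field names, and the toy categories `SB`, `SC`, `SA` are hypothetical and do not come from the paper. It records rule types as (argument list, resultant category) pairs and checks the one constraint the interpretation must satisfy, namely that the arities of associated syntactic and semantic rules match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A syntactic or semantic rule together with its type."""
    name: str
    arg_cats: tuple   # argument list, e.g. ("B", "C")
    result_cat: str   # resultant category

    @property
    def arity(self) -> int:
        # The arity is the number of categories in the argument list.
        return len(self.arg_cats)


@dataclass
class Grammar:
    basic_expressions: dict  # basic expression name -> syntactic category
    syn_rules: dict          # syntactic rule name -> Rule
    basic_meanings: dict     # basic meaning name -> semantic category
    sem_rules: dict          # semantic rule name -> Rule
    interpretation: dict     # basic expression or syntactic rule name -> set of names

    def check_arities(self) -> bool:
        """Associated syntactic and semantic rules must have equal arity."""
        return all(
            self.syn_rules[r].arity == self.sem_rules[m].arity
            for r in self.syn_rules
            for m in self.interpretation[r]
        )


# A toy grammar (hypothetical) with one binary rule and two basic expressions.
toy = Grammar(
    basic_expressions={"b": "B", "c": "C"},
    syn_rules={"R1": Rule("R1", ("B", "C"), "A")},
    basic_meanings={"m1": "SB", "m2": "SC"},
    sem_rules={"M1": Rule("M1", ("SB", "SC"), "SA")},
    interpretation={"b": {"m1"}, "c": {"m2"}, "R1": {"M1"}},
)
```

Because the interpretation maps to sets, a basic expression or syntactic rule may carry several meanings, which is the point on which this approach departs from Montague grammar.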

2.2 Syntactic Derivation Trees 

Derivational histories of syntactic expressions are represented using so-called syntactic 
derivation trees: 

Definition 1 Syntactic Derivation Tree 

A syntactic derivation tree $t$ is either a tree consisting of a single node $b$, where $b$ is the
name of a basic expression, or a tree of the form $R[t_1, \ldots, t_n]$, where $R$ is the name of a
syntactic rule, and $t_1, \ldots, t_n$ is an ordered list of syntactic derivation trees.
We define the syntactic category of a syntactic derivation tree $t$, denoted SynCat($t$), to
be the category of the basic expression if $t$ is a single node, and otherwise the resultant
category of its top syntactic rule. For convenience, we will sometimes
annotate syntactic derivation trees with their syntactic category, e.g. $t : C$.

Intuitively one may think of a syntactic derivation tree as the derivational history of a 
syntactic expression. However, not all syntactic derivation trees actually describe expres- 
sions: The definition given above does not require the syntactic rules to be applicable to 
their arguments. This distinction is described by the concept of well-formedness. 

Definition 2 Well-Formedness of Syntactic Derivation Trees 

A syntactic derivation tree t is well-formed if and only if it consists of a single basic ex- 
pression or otherwise if all the syntactic rules in the tree are applicable to their arguments 
as specified by tree t, i.e. if and only if for all the syntactic rules in tree t (i) the number 
of arguments (subtrees) matches the rule's arity, and (ii) the arguments satisfy any con- 
ditions on the syntactic categories that may be made by the syntactic rule. 
Since there is generally more than one way to derive an expression, expressions are in
general assigned a set of corresponding syntactic derivation trees. 

2.3 Semantic Derivation Trees 

The meaning of a derived expression is derived in parallel with the syntactic derivation 
process. Thus this semantic derivation process may be represented in a tree with the same 
geometry as the syntactic derivation tree, but labelled by basic meanings and semantic 
rules. This tree is called a semantic derivation tree. 

Definition 3 Semantic Derivation Tree 

A semantic derivation tree $d$ is either a tree consisting of a single node $m$, where $m$ is the
name of a basic meaning, or a tree of the form $M[d_1, \ldots, d_n]$, where $M$ is the name of a
semantic rule, and $d_1, \ldots, d_n$ is an ordered list of semantic derivation trees.
We define the semantic category of a semantic derivation tree $d$, denoted SemCat($d$), to
be the resultant category of its top semantic rule. Semantic derivation trees may also be
annotated with their semantic category, e.g. $d : C$.

Since every basic expression is associated with a set of basic meanings, and every syntactic
rule with a set of semantic rules, every syntactic derivation tree is associated with a set
of semantic derivation trees.
A semantic derivation tree is well-formed if and only if there is a corresponding well-formed
syntactic derivation tree.

3 Compositional Machine Translation 

In our definition of compositional translation the semantic component is used as an
interlingua: Source- and target-language expressions are translation-equivalent if and only if
they have at least one well-formed semantic derivation tree in common.

Definition 4 Compositional Translation 

For two compositional grammars G and G', the compositional translation of a source- 
language utterance e is a set of target-language utterances, derived as follows: 

[Figure 1: Analysis leads from the source-language utterance, via morphosyntactic analysis (1:n),
to SL syntactic derivation trees and from there to the SL/TL semantic derivation trees. Generation
leads from the SL/TL semantic derivation trees to TL syntactic derivation trees and from there,
via morphosyntactic generation (1:1), to the target-language utterances.]

Fig. 1. The Process of Compositional Translation.

• Morphosyntactic Analysis - Morphosyntactic analysis performs morphological and
syntactic analysis of a SL utterance, yielding the set of all syntactic derivation trees that
correspond to the utterance:

\[ \mathit{morsynan}(e) = \{ b \mid b = e,\ b \in BE \} \]
\[ \qquad \cup\ \{ R[t_1, \ldots, t_n] \mid e = R(e_1, \ldots, e_n),\ \forall i\,(1 \le i \le n)\ t_i \in \mathit{morsynan}(e_i),\ R \in SynR \} \]

('$R(e_1, \ldots, e_n)$' denotes the result of applying rule $R$ to expressions $e_1, \ldots, e_n$).

• Semantic Analysis - Semantic analysis of a syntactic derivation tree yields the set of
all corresponding semantic derivation trees:

\[ \mathit{seman}(b) = \llbracket b \rrbracket \]
\[ \mathit{seman}(R[t_1, \ldots, t_n]) = \{ M[d_1, \ldots, d_n] \mid M \in \llbracket R \rrbracket \wedge \forall i\,(1 \le i \le n)\ d_i \in \mathit{seman}(t_i) \} \]

• Semantic Generation - Semantic generation from a semantic derivation tree yields
the set of all corresponding syntactic derivation trees:

\[ \mathit{semgen}(m) = \{ b \mid m \in \llbracket b \rrbracket \} \]
\[ \mathit{semgen}(M[d_1, \ldots, d_n]) = \{ R[t_1, \ldots, t_n] \mid M \in \llbracket R \rrbracket \wedge \forall i\,(1 \le i \le n)\ t_i \in \mathit{semgen}(d_i) \} \]

• Morphosyntactic Generation - Morphosyntactic generation for a well-formed
syntactic derivation tree produces the corresponding utterance:

\[ \mathit{morsyngen}(b) = b \]
\[ \mathit{morsyngen}(R[t_1, \ldots, t_n]) = R(e_1, \ldots, e_n), \text{ where } \forall i\,(1 \le i \le n)\ e_i = \mathit{morsyngen}(t_i) \]
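
The two middle steps, semantic analysis and semantic generation, can be sketched directly from their recursive definitions. This is a minimal illustration under assumptions that are not part of the paper: a derivation tree is either a string (a basic expression or basic meaning) or a pair `(name, (subtree, ...))`, an interpretation is a dictionary from names to sets of names, names of basic meanings and semantic rules are disjoint, and the tables `SL_INTERP` and `TL_INTERP` are hypothetical toy data. Morphosyntactic analysis and generation are left out.

```python
from itertools import product

# Hypothetical interpretation tables for a toy SL grammar and a toy TL grammar.
SL_INTERP = {"b": {"m1"}, "c": {"m2a", "m2b"}, "R1": {"M1"}}
TL_INTERP = {"b'": {"m1"}, "c'": {"m2a"}, "R1'": {"M1"}}


def seman(tree, interp):
    """All semantic derivation trees of a syntactic derivation tree."""
    if isinstance(tree, str):               # basic expression b: [[b]]
        return set(interp[tree])
    rule, subtrees = tree
    options = [seman(t, interp) for t in subtrees]
    # Every semantic rule of the syntactic rule, over every combination
    # of semantic trees for the subtrees.
    return {(m, ds) for m in interp[rule] for ds in product(*options)}


def semgen(sem_tree, interp):
    """All syntactic derivation trees of a semantic derivation tree."""
    if isinstance(sem_tree, str):           # basic meaning m: {b | m in [[b]]}
        return {b for b, ms in interp.items() if sem_tree in ms}
    sem_rule, subtrees = sem_tree
    rules = [r for r, ms in interp.items() if sem_rule in ms]
    options = [semgen(d, interp) for d in subtrees]
    return {(r, ts) for r in rules for ts in product(*options)}


def translate_trees(tree, sl_interp, tl_interp):
    """TL syntactic derivation trees reachable from one SL syntactic tree."""
    result = set()
    for d in seman(tree, sl_interp):
        result |= semgen(d, tl_interp)
    return result
```

In this toy data the reading of `c` as `m2b` has no TL counterpart, so semantic generation returns the empty set for the corresponding semantic derivation tree while still succeeding for the `m2a` reading.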



4 Completeness of Machine Translation 



An important question regarding the reliability of compositional translation is what we call
the completeness² issue: Can the translation process be guaranteed to produce at least one
translation? In subsection 4.1 we first make this notion of completeness precise. Then, in
subsection 4.2, we investigate what conditions must be satisfied to guarantee completeness.
In section 5 conditions are elaborated for compositional grammars based on context-free
grammars.



4.1 Three Levels of Completeness 

Completeness is about the guaranteed generation of well-formed translations, given a spe- 
cific SL and TL grammar, and translation process. However, this description does not 
make precise from which stage on the translation process must be guaranteed to succeed. 
Depending on this, one may distinguish (at least) three levels of completeness (cf. fig. 1): 

• Utterance Completeness - For each well-formed SL utterance, the translation pro- 
cess yields at least one well-formed TL utterance. 

• Syntactic Completeness - For each syntactic derivation tree of each well-formed SL 
utterance, the translation process yields at least one well-formed TL utterance. 

• Semantic Completeness - For each semantic derivation tree of each syntactic deriva- 
tion tree of each well-formed SL utterance, the translation process yields at least one 
well-formed TL utterance. 

Note: Semantic completeness subsumes syntactic completeness, which in turn subsumes 
utterance completeness. 



Naively, one would like a machine translation system to produce at least one translation
for every SL utterance. This requirement is included in the definition of utterance
completeness above. However, it is well-known that natural-language utterances are often
ambiguous. For each of its interpretations, such an ambiguous utterance may have a
different translation. Therefore, a machine translation system should be able to provide at
least one translation for each of the interpretations of the SL utterance. Natural-language
ambiguity takes on two forms: structural ambiguity and lexical ambiguity. The notion of
syntactic completeness takes care of the structural ambiguity: It is formulated in terms
of structurally unambiguous syntactic derivation trees. However, syntactic completeness
is still unsatisfactory, as syntactic derivation trees are often lexically ambiguous. This is
due to the fact that basic expressions may have more than one meaning, and syntactic
rules may have more than one semantic rule associated with them. What is needed is a
formulation of completeness in terms of a structure that is both structurally and lexically
unambiguous. The solution is provided by the notion of semantic completeness.
Therefore, from now on, the term 'completeness' will be taken to refer to semantic completeness
only.

2 The term 'completeness' was taken from [Whitelock 1994, pp. 342-343]. In the Rosetta framework
completeness is known as 'strict isomorphism', and is discussed in [Landsbergen 1987] and
[Rosetta 1994].

Definition 5 Completeness 

For a pair of compositional grammars $(G, G')$, compositional translation from $G$ to $G'$
is complete if and only if for each well-formed semantic derivation tree, the translation 
process yields at least one well-formed TL utterance. 



4.2 Guaranteeing Completeness 

The central issue of this paper is the question of how to guarantee completeness. Or stated 
in terms of the process of compositional translation described above: What conditions on 
the SL and TL grammars are sufficient (and necessary) to guarantee that, after success- 
ful analysis, generation can produce a well-formed TL expression? Generation comprises 
semantic generation and morphosyntactic generation (cf. fig. 1). 

Completeness of Morphosyntactic Generation - Morphosyntactic generation evaluates
the syntactic derivation trees yielded by semantic generation by recursive rule
application. As stated in section 2, we assume that all syntactic rules are total for the categories
of their arguments. Rule application therefore succeeds if and only if the arguments are
of the correct categories. To ensure this we must move upstream to semantic generation.



Completeness of Semantic Generation - Semantic generation simply replaces the 
basic meanings and semantic rules in the semantic derivation tree with corresponding syn- 
tactic elements of the TL grammar, forming the TL syntactic derivation trees. An obvious 
necessary and sufficient condition for completeness of semantic generation is that there 
be at least one translation-equivalent counterpart in the TL grammar for each possible 
semantic element in the SL semantic derivation trees. A compositional grammar pair 
satisfying this condition is called a homomorphic grammar pair (see also [Rosetta 1994, 
p.368]): 

Definition 6 Grammar Homomorphism 

A compositional grammar pair $(G, G')$ is homomorphic from $G$ to $G'$ if and only if $G'$ is
attuned to $G$:

i. For each SL basic expression $b$, for each of the basic meanings $m$ of $b$, there is
at least one TL basic expression $b'$ such that basic meaning $m$ is also a basic
meaning of $b'$. Formally, $\forall b \in BE\ \forall m \in \llbracket b \rrbracket\ \exists b' \in BE'\ m \in \llbracket b' \rrbracket$.

ii. For each SL syntactic rule $R$, for each of the semantic rules $M$ of $R$, there is
at least one TL syntactic rule $R'$ such that semantic rule $M$ is also a semantic
rule of $R'$. Formally, $\forall R \in SynR\ \forall M \in \llbracket R \rrbracket\ \exists R' \in SynR'\ M \in \llbracket R' \rrbracket$.

However, to demand grammar homomorphism is only a necessary condition for complete- 
ness, and not a sufficient one. It merely guarantees that for every well-formed SL semantic 
derivation tree there is a corresponding TL syntactic derivation tree, and does not guar- 
antee that this syntactic derivation tree is well-formed. The next section is about such 
sufficient conditions for context-free grammars. 
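
The grammar homomorphism condition is a plain coverage check on the two interpretations, which the following sketch makes explicit. It is an illustration only: the function names `missing` and `is_homomorphic` and the four toy tables are hypothetical, and each interpretation is assumed to be given as a dictionary from basic expressions (or syntactic rules) to sets of basic meanings (or semantic rules).

```python
def missing(sl_interp, tl_interp):
    """Semantic elements used by the SL grammar that no TL element carries."""
    sl_values = set().union(*sl_interp.values())
    tl_values = set().union(*tl_interp.values())
    return sl_values - tl_values


def is_homomorphic(sl_lex, sl_rules, tl_lex, tl_rules):
    """Clause (i) is checked on the lexicons, clause (ii) on the rules."""
    return not missing(sl_lex, tl_lex) and not missing(sl_rules, tl_rules)


# Hypothetical toy data: the TL lexicon lacks a carrier for meaning m2b.
SL_LEX = {"b": {"m1"}, "c": {"m2a", "m2b"}}
SL_RULES = {"R1": {"M1"}}
TL_LEX = {"b'": {"m1"}, "c'": {"m2a"}}
TL_RULES = {"R1'": {"M1"}}
```

As the surrounding text stresses, passing this check says nothing about whether the generated TL syntactic derivation trees are well-formed; it only ensures that semantic generation never runs out of candidates.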

5 Completeness for CFG-Based Compositional Grammars 

This section presents completeness conditions for translation between compositional gram- 
mars based on the context-free grammar (CFG) formalism. We assume that the reader 
is familiar with this formalism. Subsection 5.1 explicates how a compositional grammar
can be based on context-free grammars. Subsections 5.2 and 5.3 subsequently develop
completeness conditions for such compositional grammars.

5.1 CFG-Based Compositional Grammar 

A compositional grammar consists of a syntactic component with basic expressions and 
syntactic rules, a semantic component with basic meanings and semantic rules, and an 
interpretation from the syntactic component to the semantic component. Here we model 
the syntactic component as a CFG. The semantic component and the interpretation are 
as defined above. 

In the syntactic component we let basic expressions correspond to rewrite rules that do not 
have right-hand side (RHS) nonterminals. The rule's RHS corresponds to the lexical ma- 
terial of the basic expression; the rule's left-hand side (LHS) nonterminal corresponds to 
the syntactic category of the basic expression. We let syntactic rules correspond to rewrite 
rules that do have RHS nonterminals. The type of a syntactic rule is a 2-tuple consisting 
of a list of categories of the arguments it expects and the category of the expression it 
produces. The list of categories corresponds to an ordered list of the rewrite rule's RHS 
nonterminals; the resultant category corresponds to the rewrite rule's LHS nonterminal. 
The operation performed by the syntactic rule is the in-order concatenation of its RHS 
terminals and nonterminals, where the nonterminals are replaced with the lexical material 
of the expressions which are provided as arguments to the syntactic rule. An example 
illustrates this: 

Example CFG-Based Compositional Grammars 

In this example we briefly illustrate CFG-based compositional grammars. Consider the 
following table, which shows the syntactic component of a CFG-based compositional gram- 
mar and its interpretation in the semantic component. 
CFG Rewrite Rule   Syntactic Rule       Basic Expression   Interpretation
                   Name : Type          Name : Category

A -> B C           R_1 : ((B,C),A)                         {M_1}
A -> a B d         R_2 : ((B),A)                           {M_2a, M_2b}
A -> e C B         R_3 : ((B,C),A)                         {M_3a, M_3b}
B -> b                                  b : B              {m_1}
C -> c                                  c : C              {m_2a, m_2b}


Observe that the order of syntactic categories in the argument list need not be the same as
the order in the rewrite rules (see $R_1$, $R_3$). Syntactic rules $R_1$ and $R_3$ have two arguments.
As a consequence semantic rules $M_1$, $M_{3a}$ and $M_{3b}$ are binary operators. Syntactic rule
$R_2$ and semantic rules $M_{2a}$ and $M_{2b}$ have one argument.
The notion of well-formedness can be made more precise now:

Definition 7 CFG-well-formedness 

A CFG syntactic derivation tree $t$ is CFG-well-formed if and only if it is either the name
of a basic expression, or a tree of the form $R[t_1, \ldots, t_n]$, such that (i) rule $R$'s
argument list matches the list of syntactic categories of the subtrees $t_1, \ldots, t_n$:
$SynAL(R) = (SynCat(t_1), \ldots, SynCat(t_n))$, and (ii) subtrees $t_1, \ldots, t_n$ are CFG-well-formed.
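
The example grammar and the CFG-well-formedness test can be written down in a few lines. The encoding below is an assumption made for illustration, not the paper's: a tree is a string or a pair `(rule, (subtree, ...))`, and each rule stores a right-hand-side template in which integers index into the argument list and strings are terminals introduced by the rule itself. The names `BASIC`, `RULES`, `syn_cat`, `cfg_well_formed` and `morsyngen` are hypothetical.

```python
# Basic expressions: name -> (syntactic category, lexical material).
BASIC = {"b": ("B", "b"), "c": ("C", "c")}

# Syntactic rules: name -> (argument list, resultant category, RHS template).
RULES = {
    "R1": (("B", "C"), "A", (0, 1)),        # A -> B C
    "R2": (("B",), "A", ("a", 0, "d")),     # A -> a B d
    "R3": (("B", "C"), "A", ("e", 1, 0)),   # A -> e C B
}


def syn_cat(tree):
    """Syntactic category of a derivation tree."""
    if isinstance(tree, str):
        return BASIC[tree][0]
    return RULES[tree[0]][1]


def cfg_well_formed(tree):
    """Subtrees are well-formed and their categories equal the argument list."""
    if isinstance(tree, str):
        return tree in BASIC
    rule, subtrees = tree
    if rule not in RULES or not all(cfg_well_formed(t) for t in subtrees):
        return False
    return RULES[rule][0] == tuple(syn_cat(t) for t in subtrees)


def morsyngen(tree):
    """In-order concatenation of RHS terminals and argument material."""
    if isinstance(tree, str):
        return BASIC[tree][1]
    rule, subtrees = tree
    parts = [morsyngen(t) for t in subtrees]
    template = RULES[rule][2]
    return " ".join(parts[x] if isinstance(x, int) else x for x in template)
```

For instance, `morsyngen(("R3", ("b", "c")))` places the second argument before the first, which is how a rule's argument order can differ from the order of the nonterminals in the rewrite rule.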



What about the 'translation power' of CFG-based compositional grammars? The
compositional translation method described in section 3 demands that basic expressions of
the source language correspond to basic expressions in the target language, and that the
syntactic rules of the source language correspond to syntactic rules of the target
language with the same arity. This restricts the translation power considerably. The main
degrees of freedom in the translation relation are the following. In the syntactic rules,
the nonterminals need not occur in the same order as in the argument list. This allows
translation-equivalent rules to describe word-order differences between languages.
Syntactic rules may also introduce lexical material other than that of the arguments. This is
called syncategorematic introduction (cf. syntactic rules $R_2$ and $R_3$ in the example above,
where the terminals a, d and e are introduced by the rules and are not listed as basic
expressions). The third degree of freedom relates to
the correspondence between categories of source- and target-language grammars.
Subsection 5.2 now develops a completeness condition for CFG-based compositional
grammars. Subsection 5.3 then shows that this condition is rather restrictive and presents a
way to relax it.



5.2 CFG Completeness for Many-to-One Category Correspondence 

In this section we show how a restriction of the correspondence between syntactic and 
semantic categories of the target language can lead to completeness. First we formally 
define a restriction of this correspondence. 



Definition 8 N-1 Category Correspondence

There is an N-1 category correspondence between a semantic component and a
syntactic component of a compositional grammar if and only if there is a function
$f : SemCats \to SynCats$ such that:

• $\forall m \in BM\ \forall b \in BE\quad m \in \llbracket b \rrbracket \Rightarrow SynCat(b) = f(SemCat(m))$

• $\forall M \in SemR\ \forall R \in SynR\quad (M \in \llbracket R \rrbracket \wedge SemType(M) = ((c_1, \ldots, c_n), c)) \Rightarrow SynType(R) = ((f(c_1), \ldots, f(c_n)), f(c))$
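
Whether such a function $f$ exists can be decided by collecting the equations the two clauses impose and checking that no semantic category is forced onto two different syntactic categories. The sketch below assumes, purely for illustration, that the interpretation has already been flattened into two lists of pairs; the function name `n1_correspondence` and this input format are hypothetical. It returns $f$ restricted to the categories that are actually constrained, or `None` when no function satisfies the definition.

```python
def n1_correspondence(lex_pairs, rule_pairs):
    """Build f : SemCats -> SynCats from the constraints, or return None.

    lex_pairs:  (SemCat(m), SynCat(b)) for every m in the interpretation of b.
    rule_pairs: (SemType(M), SynType(R)) for every M in the interpretation
                of R, each type written as ((c_1, ..., c_n), c).
    """
    f = {}

    def bind(sem_cat, syn_cat):
        # Record f(sem_cat) = syn_cat; fail if a different value is present.
        return f.setdefault(sem_cat, syn_cat) == syn_cat

    for sem_cat, syn_cat in lex_pairs:
        if not bind(sem_cat, syn_cat):
            return None
    for (sem_args, sem_res), (syn_args, syn_res) in rule_pairs:
        if len(sem_args) != len(syn_args):
            return None
        pairs = list(zip(sem_args, syn_args)) + [(sem_res, syn_res)]
        if not all(bind(c, d) for c, d in pairs):
            return None
    return f
```

A lexicon in which one semantic category is realised by two syntactic categories, such as gendered determiners, makes the function return `None`; this is the situation that motivates the looser correspondence of subsection 5.3.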

The restriction of compositional grammars to such an N-1 category correspondence
together with the grammar homomorphism condition gives us completeness:

Theorem 1 CFG Completeness for Many-to-One Category Correspondence

For any CFG-based compositional grammar pair $(G, G')$, compositional translation from $G$
to $G'$ is complete if (i) the grammar pair is homomorphic from $G$ to $G'$, and (ii) there
is an N-1 category correspondence between the semantic and the syntactic categories of $G'$.

Proof: 

As we are concerned with semantic completeness, we have to prove that for every gram- 
matical SL utterance, for every one of its well-formed semantic derivation trees, there 
exists at least one grammatical TL utterance. As we assume it to be trivial that mor- 
phosyntactic generation succeeds for CFG-well-formed syntactic derivation trees, we focus 
on semantic generation. We must show that every well-formed semantic derivation tree 
always yields at least one CFG-well-formed TL syntactic derivation tree. We do this by 
induction on the depth of the semantic derivation trees. 

Induction Base A semantic derivation tree of depth 1 is a basic meaning. Homomorphism 
from G to G' guarantees that there is at least one TL basic expression that is associated 
with that basic meaning. Basic expressions are trivially CFG-well-formed syntactic deriva- 
tion trees. 

Induction Hypothesis For every well-formed semantic derivation tree derivable in G which 
is of depth m or less, compositional translation yields at least one CFG-well-formed TL 
syntactic derivation tree in G' . 

Induction Step Assuming the induction hypothesis holds for arbitrary depth $m$, we must
prove that it also holds for depth $m + 1$. Every well-formed semantic derivation tree of
depth $m + 1$ is of the form $M[d_1, \ldots, d_n] : A$, where each subtree $d_i$ is of the form $M_i[\ldots] : A_i$
(see fig. 2 below). Because of the given well-formedness of the semantic derivation tree we
know that $M$ is applicable to its arguments, so that its argument list $(A_1, \ldots, A_n)$ matches
the semantic categories $A_i$ of the arguments. Homomorphism guarantees that $M$ has at
least one associated syntactic rule $R'$, which has some argument list $(B_1, \ldots, B_n)$. The
induction hypothesis guarantees that every tree $d_i$ has at least one CFG-well-formed TL
syntactic derivation tree $t'_i = R'_i[\ldots] : C_i$ associated with it. Note that the induction
hypothesis says nothing about the categories $C_i$ of these trees.

[Figure 2: On the left, the SL/TL semantic derivation tree with top rule $M$ and argument list
$(A_1, \ldots, A_n)$; on the right, the TL syntactic derivation trees with top rule $R'$ and
argument list $(B_1, \ldots, B_n)$.]

Fig. 2. Induction Step: Generating Syntactic from Semantic Derivation Trees.

The remaining question is whether there is at least one TL syntactic derivation tree formed
in this way which is CFG-well-formed, i.e. for which (def. 7): (i) rule $R'$ is applicable to
its arguments, and (ii) all subtrees $t'_i$ are CFG-well-formed. Condition (ii) is covered
by the induction hypothesis. Condition (i) requires that the argument list of rule $R'$
matches the syntactic categories of the subtrees $t'_1, \ldots, t'_n$:
$SynAL(R') = (B_1, \ldots, B_n) = (SynCat(t'_1), \ldots, SynCat(t'_n))$. From the condition in the
theorem we know that there
is an N-1 category correspondence $f$ between the semantic categories and the syntactic
categories of $G'$. Because rule $R'$ is associated with rule $M$, we know that for all $1 \le i \le n$,
$B_i = f(A_i)$. Since for all $1 \le i \le n$, we also know that tree $t'_i$ is associated with tree $d_i$, it
holds that $C_i = f(A_i)$. Since $f$ is a function, it must hold that for all $1 \le i \le n$, $B_i = C_i$,
so that the argument list of $R'$ matches the categories of its arguments. Therefore, every
such rule $R'$ is applicable to its arguments, so that completeness is guaranteed.

□ 



5.3 Many-to-Many Category Correspondence 

The N-1 category correspondence condition is rather restrictive. It implies that a semantic
category of the source language must be translated into exactly one syntactic category of the
target language. We would like to have a looser category correspondence. For example,
consider the following grammar rules for translating between English and French noun
phrases, where French uses agreement on determiners and nouns:

English Syntax          Semantics               French Syntax

R_1 : NP -> DET N       M_1 : NP -> DET N       R'_1a : NP' -> DET'_m N'_m
                                                R'_1b : NP' -> DET'_f N'_f

Here we would like to relate semantic category DET to syntactic categories DET'_m and
DET'_f, and semantic category N to syntactic categories N'_m and N'_f. To be able to do
so we could allow every semantic category to be associated with a number of syntactic
categories instead of just one. This corresponds to an N-N category correspondence.

Definition 9 N-N Category Correspondence

There is an N-N category correspondence between a semantic component and a
syntactic component of a compositional grammar if and only if there is a function
$f : SemCats \to \mathcal{P}(SynCats)$ such that:

• $\forall m \in BM\ \forall b \in BE\quad m \in \llbracket b \rrbracket \Rightarrow SynCat(b) \in f(SemCat(m))$

• $\forall M \in SemR\ \forall R \in SynR\quad (M \in \llbracket R \rrbracket \wedge SemType(M) = ((c_1, \ldots, c_n), c)) \Rightarrow SynType(R) = ((d_1, \ldots, d_n), d)$,
where $\forall i\,(1 \le i \le n)\ d_i \in f(c_i)$ and $d \in f(c)$

For a semantic category $C$ the set of corresponding syntactic categories $f(C)$ is called the
category correspondence set of $C$ and is denoted $\overline{C}$.

For this new situation we must adjust the completeness condition. Referring to fig. 2, it now
is the case that each syntactic category $C_i$ may be any category in the set $f(A_i)$. As the
induction hypothesis guarantees only one successful translation for each subtree $d_i$, and it is
not known which one, guaranteeing completeness requires a syntactic
rule $R'$ for every argument list in $f(A_1) \times \ldots \times f(A_n)$. This is an unrealistic condition: In
the English/French example it corresponds to the demand that there must be a French
syntactic rule for all four argument lists (DET'_m, N'_m), (DET'_m, N'_f), (DET'_f, N'_m), (DET'_f, N'_f).
But to demand that for example there is a syntactic rule $R'$ that combines a masculine
determiner DET'_m and a feminine noun N'_f, as this would imply, is nonsensical. The
underlying problem is that the agreement dependencies cannot be expressed explicitly in
the CFG grammar formalism. The lesson to be learned from this example is that the
dependencies between the categories should be taken into account. We present a way of
encoding information about the dependencies between categories in CFG-based
compositional grammar. To this end we distinguish two kinds of category correspondence.

Definition 10 Conjunctive/Disjunctive Correspondence Category 

For a compositional grammar, a semantic category $N$ is a conjunctive (correspondence)
category if and only if for every well-formed semantic derivation tree $d$ of category $N$, for
every corresponding category $N'$ in its category correspondence set $\overline{N}$, there exists at
least one corresponding well-formed
syntactic derivation tree $t'$ of category $N'$. Any semantic category that is not a
conjunctive correspondence category is called a disjunctive (correspondence) category. Semantic
categories that have only one syntactic category in their category correspondence set are
trivially conjunctive categories.

For example, in the case of the English/French NP rules, the semantic category DET
corresponds conjunctively to categories DET'_m and DET'_f (any determiner has both a
masculine and a feminine form), whilst semantic category N corresponds disjunctively to
categories N'_m and N'_f (nouns usually have either masculine or feminine gender).
Semantic category NP corresponds to only one category, NP', and is therefore a conjunctive
category.

How can we use this to establish a condition for completeness? The key idea is that some of
the CFG-well-formed syntactic derivation trees of some category $A$ may be guaranteed to
translate into at least one CFG-well-formed TL syntactic derivation tree for all categories
in the category correspondence set $\overline{A}$, instead of 'for at least one'. Category $A$ is then
said to correspond conjunctively to
the categories in $\overline{A}$. As opposed to disjunctive categories, a conjunctive category does not
require every rule $R'$ to have translation-equivalent variants for all categories in $\overline{A}$. Thus,
the distinction between conjunctive and disjunctive categories allows for a more realistic
condition on the grammars.

We adjust the definition of N-N category correspondence, taking into account the
distinction between conjunctive and disjunctive categories.

As for the basic meanings and basic expressions: for every basic meaning m, if its
category C is a disjunctive category, there must be at least one associated basic expression
b' with category C' for at least one category C' in the category correspondence set of C.
If category C of basic meaning m is a conjunctive category, then there must exist at least
one associated basic expression b' with category C' for every category C' in the category
correspondence set of C.
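
The lexical condition above can be checked mechanically. The following is a minimal sketch, assuming a dictionary encoding that is not part of the paper: `corr` maps a semantic category to its category correspondence set, `kind` records whether it was declared conjunctive or disjunctive, and the function name `lexicon_violations` as well as the toy entries THE and CAT are purely illustrative.

```python
from typing import Dict, List, Set


def lexicon_violations(
    meaning_cat: Dict[str, str],
    expr_cats: Dict[str, Set[str]],
    corr: Dict[str, Set[str]],
    kind: Dict[str, str],
) -> List[str]:
    """Return the basic meanings that violate the lexical condition.

    meaning_cat maps each basic meaning to its semantic category.
    expr_cats maps each basic meaning to the syntactic categories of the
    basic expressions associated with it.
    corr and kind are a hypothetical encoding of the category correspondence
    sets and of the conjunctive ("conj") / disjunctive ("disj") declaration.
    """
    bad = []
    for meaning, cat in meaning_cat.items():
        covered = expr_cats.get(meaning, set()) & corr[cat]
        if kind[cat] == "conj":
            ok = covered == corr[cat]  # an expression for every C' in the set
        else:
            ok = bool(covered)         # an expression for at least one C'
        if not ok:
            bad.append(meaning)
    return bad


# Toy data modelled on the English/French NP example (illustrative only).
corr = {"DET": {"DET'_m", "DET'_f"}, "N": {"N'_m", "N'_f"}}
kind = {"DET": "conj", "N": "disj"}
meaning_cat = {"THE": "DET", "CAT": "N"}
expr_cats = {"THE": {"DET'_m", "DET'_f"}, "CAT": {"N'_m"}}

print(lexicon_violations(meaning_cat, expr_cats, corr, kind))  # prints []
```

Removing the feminine determiner from the entry for THE would make it a violation, whereas the noun entry passes with a single gender because N is disjunctive.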

As for the semantic and syntactic rules, for every semantic rule M with type
$((A_1, \ldots, A_n), A)$, we establish conditions on the syntactic rules with which it is
associated. Again referring to fig. 2, when generating a syntactic derivation tree from a
semantic derivation tree, for subtrees $d_i$ that have a conjunctive category C we can
guarantee a tree $t'_i$ for every category in the correspondence set of C. For subtrees
$d_i$ that have a disjunctive category C we can guarantee a tree $t'_i$ for only one
category in that set, and we do not know which one. Therefore, writing $\mathcal{A}_i$ for
the category correspondence set of $A_i$ (the index sets $I_c$ and $I_d$ are defined in the
footnote), we must guarantee that for every tuple $D \in \prod_{i \in I_d} \mathcal{A}_i$ of
the syntactic categories corresponding to disjunctive categories of M, there exists at least
one syntactic rule R' with type $((B_1, \ldots, B_n), B)$ such that:

• The tuple of the syntactic categories corresponding to the disjunctive categories
of the argument list of M is equal to D: $(B_i \mid i \in I_d) = D$.

• Every syntactic category $B_i$ that corresponds to a conjunctive category $A_i$ of
the argument list of M is in the category correspondence set $\mathcal{A}_i$ of $A_i$:
$\forall i \in I_c : B_i \in \mathcal{A}_i$.

• In addition, the resultant category A of semantic rule M must be taken into
account. If this is a disjunctive category, then it suffices if the resultant
category B of the syntactic rule R' is in the category correspondence set of A.
If category A is a conjunctive category, then there must be at least one syntactic
rule R' with resultant category N' for all categories N' in the category
correspondence set of A.
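
The three requirements above can be read as a finite check per semantic rule. The sketch below is one possible reading, not the author's formulation: the encoding (`corr`, `kind`, rules as `(argument tuple, result)` pairs) and the name `rule_condition_holds` are assumptions, `syn_rules` is taken to contain only the syntactic rules associated with M, and for a conjunctive resultant category the third requirement is interpreted as "for each tuple D, every category in the correspondence set of A occurs as the result of some matching rule". The French rule shapes NP' -> DET'_m N'_m and NP' -> DET'_f N'_f are likewise assumed from the description of the NP example.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Rule = Tuple[Tuple[str, ...], str]  # (argument categories, resultant category)


def rule_condition_holds(
    sem_rule: Rule,
    syn_rules: List[Rule],
    corr: Dict[str, Set[str]],
    kind: Dict[str, str],
) -> bool:
    """Check the rule condition for one semantic rule M (hypothetical encoding)."""
    args, result = sem_rule
    i_c = [i for i, a in enumerate(args) if kind[a] == "conj"]  # index set I_c
    i_d = [i for i, a in enumerate(args) if kind[a] == "disj"]  # index set I_d

    def fits(rule: Rule, d: Tuple[str, ...]) -> bool:
        b_args, b = rule
        return (
            len(b_args) == len(args)
            and tuple(b_args[i] for i in i_d) == d            # (B_i | i in I_d) = D
            and all(b_args[i] in corr[args[i]] for i in i_c)  # B_i in set of A_i
            and b in corr[result]                             # B in set of A
        )

    # Enumerate every tuple D in the product of the disjunctive argument sets.
    for d in product(*(sorted(corr[args[i]]) for i in i_d)):
        matching = [r for r in syn_rules if fits(r, d)]
        if not matching:
            return False
        if kind[result] == "conj":
            # Every category in the resultant set must be produced for this D.
            if not corr[result] <= {b for _, b in matching}:
                return False
    return True


# English/French NP example: DET conjunctive, N disjunctive, NP conjunctive.
corr = {"DET": {"DET'_m", "DET'_f"}, "N": {"N'_m", "N'_f"}, "NP": {"NP'"}}
kind = {"DET": "conj", "N": "disj", "NP": "conj"}
sem_np: Rule = (("DET", "N"), "NP")
french_np: List[Rule] = [
    (("DET'_m", "N'_m"), "NP'"),
    (("DET'_f", "N'_f"), "NP'"),
]

print(rule_condition_holds(sem_np, french_np, corr, kind))      # prints True
print(rule_condition_holds(sem_np, french_np[:1], corr, kind))  # prints False
```

With only the masculine rule available, the tuple D = (N'_f) has no matching syntactic rule, so the check fails. Note that no rule combining DET'_m with N'_f is needed, which is exactly what the conjunctive declaration of DET buys.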

Using this condition we again obtain completeness: 

Theorem 2 CFG Completeness for Many-to-Many Category Correspondence 

For any CFG-based compositional grammar pair (G, G'), compositional translation from G 
to G' is complete if (i) the grammar pair is homomorphic from G to G', and (ii) there is 
an N-N category correspondence between the semantic and the syntactic categories of G', 
where every semantic category of G' has been declared conjunctive or disjunctive and the 
sets of categories of G' satisfy the condition described above. 

Because of space limitations we do not include the proof; we trust that the description of 
the condition above gives the reader an insight into how the proof can be given. 

Example Returning to the English/French example discussed earlier, we declared DET
a conjunctive, N a disjunctive, and NP a conjunctive category. Checking the condition
formulated above, this amounts to the requirement that for every tuple D in
$\{(N'_m), (N'_f)\}$, there exists a syntactic rule R' such that
$(B_i \mid i \in I_d) = D$ and $\forall i \in I_c : B_i \in \mathcal{A}_i$ (where
$\mathcal{A}_i$ is the category correspondence set of $A_i$), which is indeed the case.

^Consider the following auxiliary definitions. For any argument list $(A_1, \ldots, A_n)$,
define sets $I_c$ and $I_d$ as consisting of the indices of its conjunctive and disjunctive
categories, respectively. Define $(A_i \mid i \in I_c)$ and $(A_i \mid i \in I_d)$ as the
corresponding subtuples.
6 Conclusion and Future Research



In this paper we presented the issue of completeness for compositional translation, and
discussed how completeness conditions for compositional translation can be found. In
section 5 we examined the completeness issue for context-free grammars. We established
completeness conditions for grammars with an N-1 category correspondence. As this condition
is rather restrictive, we relaxed it to an N-N category correspondence condition. The
first attempt, however, led to unrealistic conditions on the grammar rules, so that it was
necessary to introduce the distinction between conjunctive and disjunctive categories. We
adjusted the N-N category correspondence condition accordingly, and obtained a
completeness condition for grammars with an N-N category correspondence.

The central issues in ongoing and future research are (i) the completeness issue for some
other grammar formalisms, (ii) the algebraic formulation of completeness, and (iii)
polynomial compositional translation.

(i) Completeness for Other Grammar Formalisms - The definite-clause grammar
formalism (DCG, see e.g. [Pereira and Shieber 1987]) extends the CFG grammar formalism
with attributes added to the nonterminals. Attributes have a variety of uses, one of
the most prominent being the enforcement of agreement relations. As for the completeness
condition for DCG, we assume the same conditions on the nonterminals as we did
for CFG. In addition, we formulate restrictions on the use of attributes. A proof has been
established for completeness of grammars that satisfy these restrictions.
Future research will also address the completeness issue for Tree-Adjoining Grammars.
Tree-Adjoining Grammars are interesting because they are somewhat more expressive
than CFGs (they are so-called mildly context-sensitive), and they make it possible to
express linguistic phenomena such as long-distance dependencies.



(ii) Algebraic Formulation of Compositional Translation - Compositional gram- 
mar, compositional translation and the completeness issue can be formulated algebraically. 
Such an algebraic formulation has a number of advantages: (i) it abstracts away from 
the details of specific grammar formalisms, thus revealing the essentials of compositional 
translation and completeness, (ii) this abstraction provides a basis for the comparison 
of different grammar formalisms, and (iii) an algebraic formulation gives access to
well-investigated mathematical theory, the results of which may be readily carried over. I
hope to use the algebraic formulation as a basis for the investigation of the combination of
the use of features and completeness. For other work on algebraic description of natural
language, see [Janssen 1986], [Hendriks 1993]. An algebraic view on compositional translation
is presented in [Rosetta 1994, Ch.19].



(iii) Polynomial Compositional Translation - Another line of work is concerned 
with an extension of the method of compositional translation for grammar formalisms 
that use only concatenative operations. The basic idea here is a generalization of the unit 
of translation-equivalence from single elements to combinations of these (polynomials). 
This improves 'translation power', as it becomes possible to overcome all kinds of
translation problems due to structural divergences between languages. For example it becomes
possible to relate a structure like [A [B C]] with a structure like [A' B' C']. I hope to show
that, as polynomially derived algebras are algebras again, completeness conditions found
for compositional translation will carry over to polynomial compositional translation.
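
The idea of taking a combination of operations as the unit of translation-equivalence can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: bracketings are modelled as nested Python lists, and the names `compose`, `s_inner`, `s_outer`, `t_flat` and `translate` are hypothetical. It only shows how a polynomial built from two binary source operations can be paired with a single ternary target operation; it says nothing about the completeness proof itself.

```python
from typing import Callable, Dict, List


def compose(outer: Callable, inner: Callable) -> Callable:
    """Derived (polynomial) operation p(a, b, c) = outer(a, inner(b, c))."""
    return lambda a, b, c: outer(a, inner(b, c))


# Two binary source-language operations that together build [A [B C]].
def s_inner(b, c) -> List:
    return [b, c]


def s_outer(a, bc) -> List:
    return [a, bc]


# One ternary target-language operation that builds the flat [A' B' C'].
def t_flat(a, b, c) -> List:
    return [a, b, c]


# The polynomial p, not s_outer or s_inner individually, is the unit that is
# declared translation-equivalent to t_flat.
p = compose(s_outer, s_inner)
basic: Dict[str, str] = {"A": "A'", "B": "B'", "C": "C'"}


def translate(a: str, b: str, c: str) -> List:
    """Translate a source structure p(a, b, c) via its target counterpart."""
    return t_flat(basic[a], basic[b], basic[c])


print(p("A", "B", "C"))          # prints ['A', ['B', 'C']]
print(translate("A", "B", "C"))  # prints ["A'", "B'", "C'"]
```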